4 research outputs found

    Infrastructure for Performance Monitoring and Analysis of Systems and Applications

    Get PDF
    The growth of High Performance Computer (HPC) systems increases the complexity with respect to understanding resource utilization, system management, and performance issues. HPC performance monitoring tools need to collect information at both the application and system levels to yield a complete performance picture. Existing approaches limit the abilities of the users to do meaningful analysis on actionable timescale. Efficient infrastructures are required to support largescale systems performance data analysis for both run-time troubleshooting and post-run processing modes. In this dissertation, we present methods to fill these gaps in the infrastructure for HPC performance monitoring and analysis. First, we enhance the architecture of a monitoring system to integrate streaming analysis capabilities at arbitrary locations within its data collection, transport, and aggregation facilities. Next, we present an approach to streaming collection of application performance data. We integrate these methods with a monitoring system used on large-scale computational platforms. Finally, we present a new approach for constructing durable transactional linked data structures that takes advantage of byte-addressable non-volatile memory technologies. Transactional data structures are building blocks of in-memory databases that are used by HPC monitoring systems to store and retrieve data efficiently. We evaluate the presented approaches on a series of case studies. The experiment results demonstrate the impact of our tools, while keeping the overhead in an acceptable margin

    A Methodology For Performance Analysis Of Non-Blocking Algorithms Using Hardware And Software Metrics

    No full text
    Non-blocking algorithms are a class of algorithms that provide guarantees of progress within a system. These progress guarantees come from the fine-grained synchronization techniques incorporated into their design. There are a number of various non-blocking designs and implementations of concurrent algorithms. However, trade-offs between performance and non-blocking algorithm design decisions are poorly understood. The most common method to measure the performance of non-blocking algorithms is to analyze the number of operations completed over a period of time. Unfortunately, this coarse-grained approach for performance analysis is unable to capture and explain many of the nuances of the behavior of non-blocking algorithms. This can result in a flawed perception of such algorithms, which may lead to a misguided use of them. This work provides a fine-grained approach for the analysis of the design and use of non-blocking algorithms. To support this analysis, we introduce a methodology that enables a user to simulate an application\u27s use of an arbitrary non-blocking algorithm and extract insightful information from the performance results. To better understand the behavior of non-blocking algorithms, we present metrics to measure the effectiveness of the key synchronization and memory management techniques used in non-blocking algorithms. Our analysis combines these new metrics with several well-known hardware metrics to explain key behaviors and develop new insights. To demonstrate the effectiveness of the proposed methodology, we integrate it within the Tervel framework and analyzed Tervel\u27s vector in various use cases. Our experiments show that helping mechanisms negatively impact throughput by increasing misaligned data cache accesses. Furthermore, by studying the correlations between different metrics, we are able to observe the effect of thread interference on the CPU instructions and instruction cache invalidation, and then link the decrease in work completed to this effect. In addition to the provided information, these metrics revealed implementation errors that did not affect correctness but caused increased thread congestion

    An Efficient Latch-Free Database Index Based On Multi-Dimensional Lists

    No full text
    In the interests of improving database performance, researchers have considered lock-free data structures for their attractive progress guarantees and scalability. This paper considers the performance of a recently developed lock-free structure, multi-dimensional list (MDList), used as a database index in SOS, a high-performance, object-oriented database. In our tests, we find that MDList outperforms the existing locking structures in multi-threaded workloads. This is the first known use of MDList as an index structure in databases

    Integrating Low-Latency Analysis Into Hpc System Monitoring

    No full text
    The growth of High Performance Computer (HPC) systems increases the complexity with respect to understanding resource utilization, system management, and performance issues. While raw performance data is increasingly exposed at the component level, the usefulness of the data is dependent on the ability to do meaningful analysis on actionable timescales. However, current system monitoring infrastructures largely focus on data collection, with analysis performed off-system in post-processing mode. This increases the time required to provide analysis and feedback to a variety of consumers. In this work, we enhance the architecture of a monitoring system used on large-scale computational platforms, to integrate streaming analysis capabilities at arbitrary locations within its data collection, transport, and aggregation facilities. We leverage the flexible communication topology of the monitoring system to enable placement of transformations based on overhead concerns, while still enabling low-latency exposure on node. Our design internally supports and exposes the raw and transformed data uniformly for both node level and off-system consumers. We show the viability of our implementation for a case with production-relevance: run-time determination of the relative per-node files system demands
    corecore